Goto

Collaborating Authors

 bias dimension


Spot the BlindSpots: Systematic Identification and Quantification of Fine-Grained LLM Biases in Contact Center Summaries

Mayilvaghanan, Kawin, Gupta, Siddhant, Kumar, Ayush

arXiv.org Artificial Intelligence

Abstractive summarization is a core application in contact centers, where Large Language Models (LLMs) generate millions of summaries of call transcripts daily. Despite their apparent quality, it remains unclear whether LLMs systematically under- or over-attend to specific aspects of the transcript, potentially introducing biases in the generated summary. While prior work has examined social and positional biases, the specific forms of bias pertinent to contact center operations - which we term Operational Bias - have remained unexplored. To address this gap, we introduce BlindSpot, a framework built upon a taxonomy of 15 operational bias dimensions (e.g., disfluency, speaker, topic) for the identification and quantification of these biases. BlindSpot leverages an LLM as a zero-shot classifier to derive categorical distributions for each bias dimension in a pair of transcript and its summary. The bias is then quantified using two metrics: Fidelity Gap (the JS Divergence between distributions) and Coverage (the percentage of source labels omitted). Using BlindSpot, we conducted an empirical study with 2500 real call transcripts and their summaries generated by 20 LLMs of varying scales and families (e.g., GPT, Llama, Claude). Our analysis reveals that biases are systemic and present across all evaluated models, regardless of size or family.


VIGNETTE: Socially Grounded Bias Evaluation for Vision-Language Models

Raj, Chahat, Wei, Bowen, Caliskan, Aylin, Anastasopoulos, Antonios, Zhu, Ziwei

arXiv.org Artificial Intelligence

While bias in large language models (LLMs) is well-studied, similar concerns in vision-language models (VLMs) have received comparatively less attention. Existing VLM bias studies often focus on portrait-style images and gender-occupation associations, overlooking broader and more complex social stereotypes and their implied harm. This work introduces VIGNETTE, a large-scale VQA benchmark with 30M+ images for evaluating bias in VLMs through a question-answering framework spanning four directions: factuality, perception, stereotyping, and decision making. Beyond narrowly-centered studies, we assess how VLMs interpret identities in contextualized settings, revealing how models make trait and capability assumptions and exhibit patterns of discrimination. Drawing from social psychology, we examine how VLMs connect visual identity cues to trait and role-based inferences, encoding social hierarchies, through biased selections. Our findings uncover subtle, multifaceted, and surprising stereotypical patterns, offering insights into how VLMs construct social meaning from inputs.


Bias Similarity Across Large Language Models

Jeong, Hyejun, Ma, Shiqing, Houmansadr, Amir

arXiv.org Artificial Intelligence

Bias in machine learning models has been a chronic problem, especially as these models influence decision-making in human society. In generative AI, such as Large Language Models, the impact of bias is even more profound compared to the classification models. LLMs produce realistic and human-like content that users may unconsciously trust, which could perpetuate harmful stereotypes to the uncontrolled public. It becomes particularly concerning when utilized in journalism or education. While prior studies have explored and quantified bias in individual AI models, no work has yet compared bias similarity across different LLMs. To fill this gap, we take a comprehensive look at ten open- and closed-source LLMs from four model families, assessing the extent of biases through output distribution. Using two datasets-one containing 4k questions and another with one million questions for each of the four bias dimensions -- we measure functional similarity to understand how biases manifest across models. Our findings reveal that 1) fine-tuning does not significantly alter output distributions, which would limit its ability to mitigate bias, 2) LLMs within the same family tree do not produce similar output distributions, implying that addressing bias in one model could have limited implications for others in the same family, and 3) there is a possible risk of training data information leakage, raising concerns about privacy and data security. Our analysis provides insight into LLM behavior and highlights potential risks in real-world deployment.


Intertwined Biases Across Social Media Spheres: Unpacking Correlations in Media Bias Dimensions

Liu, Yifan, Li, Yike, Wang, Dong

arXiv.org Artificial Intelligence

Media bias significantly shapes public perception by reinforcing stereotypes and exacerbating societal divisions. Prior research has often focused on isolated media bias dimensions such as \textit{political bias} or \textit{racial bias}, neglecting the complex interrelationships among various bias dimensions across different topic domains. Moreover, we observe that models trained on existing media bias benchmarks fail to generalize effectively on recent social media posts, particularly in certain bias identification tasks. This shortfall primarily arises because these benchmarks do not adequately reflect the rapidly evolving nature of social media content, which is characterized by shifting user behaviors and emerging trends. In response to these limitations, our research introduces a novel dataset collected from YouTube and Reddit over the past five years. Our dataset includes automated annotations for YouTube content across a broad spectrum of bias dimensions, such as gender, racial, and political biases, as well as hate speech, among others. It spans diverse domains including politics, sports, healthcare, education, and entertainment, reflecting the complex interplay of biases across different societal sectors. Through comprehensive statistical analysis, we identify significant differences in bias expression patterns and intra-domain bias correlations across these domains. By utilizing our understanding of the correlations among various bias dimensions, we lay the groundwork for creating advanced systems capable of detecting multiple biases simultaneously. Overall, our dataset advances the field of media bias identification, contributing to the development of tools that promote fairer media consumption. The comprehensive awareness of existing media bias fosters more ethical journalism, promotes cultural sensitivity, and supports a more informed and equitable public discourse.


Breaking Bias, Building Bridges: Evaluation and Mitigation of Social Biases in LLMs via Contact Hypothesis

Raj, Chahat, Mukherjee, Anjishnu, Caliskan, Aylin, Anastasopoulos, Antonios, Zhu, Ziwei

arXiv.org Artificial Intelligence

Large Language Models (LLMs) perpetuate social biases, reflecting prejudices in their training data and reinforcing societal stereotypes and inequalities. Our work explores the potential of the Contact Hypothesis, a concept from social psychology for debiasing LLMs. We simulate various forms of social contact through LLM prompting to measure their influence on the model's biases, mirroring how intergroup interactions can reduce prejudices in social contexts. We create a dataset of 108,000 prompts following a principled approach replicating social contact to measure biases in three LLMs (LLaMA 2, Tulu, and NousHermes) across 13 social bias dimensions. We propose a unique debiasing technique, Social Contact Debiasing (SCD), that instruction-tunes these models with unbiased responses to prompts. Our research demonstrates that LLM responses exhibit social biases when subject to contact probing, but more importantly, these biases can be significantly reduced by up to 40% in 1 epoch of instruction tuning LLaMA 2 following our SCD strategy. Our code and data are available at https://github.com/chahatraj/breakingbias.


BiasDora: Exploring Hidden Biased Associations in Vision-Language Models

Raj, Chahat, Mukherjee, Anjishnu, Caliskan, Aylin, Anastasopoulos, Antonios, Zhu, Ziwei

arXiv.org Artificial Intelligence

Existing works examining Vision Language Models (VLMs) for social biases predominantly focus on a limited set of documented bias associations, such as gender:profession or race:crime. This narrow scope often overlooks a vast range of unexamined implicit associations, restricting the identification and, hence, mitigation of such biases. We address this gap by probing VLMs to (1) uncover hidden, implicit associations across 9 bias dimensions. We systematically explore diverse input and output modalities and (2) demonstrate how biased associations vary in their negativity, toxicity, and extremity. Our work (3) identifies subtle and extreme biases that are typically not recognized by existing methodologies. We make the Dataset of retrieved associations, (Dora), publicly available here https://github.com/chahatraj/BiasDora.


Towards Region-aware Bias Evaluation Metrics

Borah, Angana, Garimella, Aparna, Mihalcea, Rada

arXiv.org Artificial Intelligence

When exposed to human-generated data, language models are known to learn and amplify societal biases. While previous works introduced benchmarks that can be used to assess the bias in these models, they rely on assumptions that may not be universally true. For instance, a gender bias dimension commonly used by these metrics is that of family--career, but this may not be the only common bias in certain regions of the world. In this paper, we identify topical differences in gender bias across different regions and propose a region-aware bottom-up approach for bias assessment. Our proposed approach uses gender-aligned topics for a given region and identifies gender bias dimensions in the form of topic pairs that are likely to capture gender societal biases. Several of our proposed bias topic pairs are on par with human perception of gender biases in these regions in comparison to the existing ones, and we also identify new pairs that are more aligned than the existing ones. In addition, we use our region-aware bias topic pairs in a Word Embedding Association Test (WEAT)-based evaluation metric to test for gender biases across different regions in different data domains. We also find that LLMs have a higher alignment to bias pairs for highly-represented regions showing the importance of region-aware bias evaluation metric.


Ethical-Lens: Curbing Malicious Usages of Open-Source Text-to-Image Models

Cai, Yuzhu, Yin, Sheng, Wei, Yuxi, Xu, Chenxin, Mao, Weibo, Juefei-Xu, Felix, Chen, Siheng, Wang, Yanfeng

arXiv.org Artificial Intelligence

The burgeoning landscape of text-to-image models, exemplified by innovations such as Midjourney and DALLE 3, has revolutionized content creation across diverse sectors. However, these advancements bring forth critical ethical concerns, particularly with the misuse of open-source models to generate content that violates societal norms. Addressing this, we introduce Ethical-Lens, a framework designed to facilitate the value-aligned usage of text-to-image tools without necessitating internal model revision. Ethical-Lens ensures value alignment in text-to-image models across toxicity and bias dimensions by refining user commands and rectifying model outputs. Systematic evaluation metrics, combining GPT4-V, HEIM, and FairFace scores, assess alignment capability. Our experiments reveal that Ethical-Lens enhances alignment capabilities to levels comparable with or superior to commercial models like DALLE 3, ensuring user-generated content adheres to ethical standards while maintaining image quality. This study indicates the potential of Ethical-Lens to ensure the sustainable development of open-source text-to-image tools and their beneficial integration into society. Our code is available at https://github.com/yuzhu-cai/Ethical-Lens.


MAFIA: Multi-Adapter Fused Inclusive LanguAge Models

Jain, Prachi, Sathe, Ashutosh, Gumma, Varun, Ahuja, Kabir, Sitaram, Sunayana

arXiv.org Artificial Intelligence

Pretrained Language Models (PLMs) are widely used in NLP for various tasks. Recent studies have identified various biases that such models exhibit and have proposed methods to correct these biases. However, most of the works address a limited set of bias dimensions independently such as gender, race, or religion. Moreover, the methods typically involve finetuning the full model to maintain the performance on the downstream task. In this work, we aim to modularly debias a pretrained language model across multiple dimensions. Previous works extensively explored debiasing PLMs using limited US-centric counterfactual data augmentation (CDA). We use structured knowledge and a large generative model to build a diverse CDA across multiple bias dimensions in a semi-automated way. We highlight how existing debiasing methods do not consider interactions between multiple societal biases and propose a debiasing model that exploits the synergy amongst various societal biases and enables multi-bias debiasing simultaneously. An extensive evaluation on multiple tasks and languages demonstrates the efficacy of our approach.


Large Language Model (LLM) Bias Index -- LLMBI

Oketunji, Abiodun Finbarrs, Anas, Muhammad, Saina, Deepthi

arXiv.org Artificial Intelligence

The Large Language Model Bias Index (LLMBI) is a pioneering approach designed to quantify and address biases inherent in large language models (LLMs), such as GPT-4. We recognise the increasing prevalence and impact of LLMs across diverse sectors. This research introduces a novel metric, LLMBI, to systematically measure and mitigate biases potentially skewing model responses. We formulated LLMBI using a composite scoring system incorporating multiple dimensions of bias, including but not limited to age, gender, and racial biases. To operationalise this metric, we engaged in a multi-step process involving collecting and annotating LLM responses, applying sophisticated Natural Language Processing (NLP) techniques for bias detection, and computing the LLMBI score through a specially crafted mathematical formula. The formula integrates weighted averages of various bias dimensions, a penalty for dataset diversity deficiencies, and a correction for sentiment biases. Our empirical analysis, conducted using responses from OpenAI's API, employs advanced sentiment analysis as a representative method for bias detection. The research reveals LLMs, whilst demonstrating impressive capabilities in text generation, exhibit varying degrees of bias across different dimensions. LLMBI provides a quantifiable measure to compare biases across models and over time, offering a vital tool for systems engineers, researchers and regulators in enhancing the fairness and reliability of LLMs. It highlights the potential of LLMs in mimicking unbiased human-like responses. Additionally, it underscores the necessity of continuously monitoring and recalibrating such models to align with evolving societal norms and ethical standards.